NCTU and NTUT's Entry to CLP-2014 Chinese Spelling Check Evaluation

Authors

  • Yih-Ru Wang
  • Yuan-Fu Liao
Abstract

This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2014 evaluation. The system's main components are still the conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and the tri-gram language model (LM) used last year. This year, however, we refined the misspelling rules and the decision-making threshold to reduce the false alarm rate, and we sped up LM rescoring. The Bake-off 2014 evaluation results show that one of our systems (Run2) achieved reasonable performance, with accuracies of about 0.485/0.468 and F1 scores of 0.226/0.180 in the detection/correction metrics.
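As a rough illustration of the idea of tri-gram LM rescoring with a decision threshold, the sketch below scores candidate corrections and accepts one only when it improves the sentence log-probability by more than a margin. It is not the authors' system: it works on characters rather than on CRF-segmented words, uses add-one smoothing, and the names `train_trigram`, `log_prob`, `correct`, the toy corpus, and the confusion set are all assumptions made for the example.

```python
import math
from collections import Counter

def train_trigram(sentences):
    """Count character trigrams and their bigram contexts (toy model)."""
    tri, bi, vocab = Counter(), Counter(), set()
    for s in sentences:
        chars = ["<s>", "<s>"] + list(s) + ["</s>"]
        vocab.update(chars)
        for i in range(2, len(chars)):
            tri[tuple(chars[i - 2:i + 1])] += 1
            bi[tuple(chars[i - 2:i])] += 1
    return tri, bi, len(vocab)

def log_prob(sentence, model):
    """Sentence log-probability under the trigram model, add-one smoothed."""
    tri, bi, v = model
    chars = ["<s>", "<s>"] + list(sentence) + ["</s>"]
    total = 0.0
    for i in range(2, len(chars)):
        num = tri[tuple(chars[i - 2:i + 1])] + 1
        den = bi[tuple(chars[i - 2:i])] + v
        total += math.log(num / den)
    return total

def correct(sentence, model, confusion, threshold=1.0):
    """Try single-character substitutions from a confusion set.

    A candidate replaces the input only if its log-probability exceeds the
    original's by more than `threshold`; a higher threshold means fewer
    changes and therefore fewer false alarms.
    """
    base = log_prob(sentence, model)
    best, best_score = sentence, base
    for i, ch in enumerate(sentence):
        for alt in confusion.get(ch, ()):
            cand = sentence[:i] + alt + sentence[i + 1:]
            score = log_prob(cand, model)
            if score - base > threshold and score > best_score:
                best, best_score = cand, score
    return best

# Hypothetical toy corpus and confusion set, for illustration only.
model = train_trigram(["我在家裡", "他在家裡", "我在學校"])
confusion = {"再": ["在"], "在": ["再"]}
print(correct("我再家裡", model, confusion))
```

Raising `threshold` makes the sketch more conservative: with a large enough margin the misspelled input is left unchanged, which trades missed corrections for a lower false alarm rate.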

Similar resources

Chinese Spelling Check System Based on Tri-gram Model

This paper describes our system in the Chinese spelling check (CSC) task of CLP-SIGHAN Bake-Off 2014. CSC is still an open problem today. To the best of our knowledge, n-gram language modeling (LM) is widely used in CSC because of its simplicity and fair predictive power. Our work in this paper continues this general line of research by using a tri-gram LM to detect and correct possible spellin...

Introduction to NJUPT Chinese Spelling Check Systems in CLP-2014 Bakeoff

Chinese spelling check (CSC) is an essential issue in the research field of Chinese language processing (CLP). This paper describes the details of two CSC systems we developed to solve this problem. The first system was built on a CRF model; its modules include word segmentation, error detection and error correction. The other system was based on the 2Chars&&3-Chars model, and ...

NTOU Chinese Spelling Check System in CLP Bake-off 2014

This paper describes the details of the NTOU Chinese spelling check system that participated in the CLP-2014 Bake-off. Confusion sets were expanded by using two language resources, Shuowen and Four-Corner codes. A new method to find spelling errors in legal multi-character words was proposed. Comparison of sentence generation probabilities is the main information for error detection and correction. A rule-based c...

Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check

This paper introduces a Chinese Spelling Check campaign organized for the SIGHAN 2014 bake-off, including task description, data preparation, performance metrics, and evaluation results based on essays written by learners of Chinese as a foreign language. The hope is that such evaluations can produce more advanced Chinese spelling check techniques.

Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013

This paper gives an overview of the Chinese Spelling Check task at SIGHAN Bake-off 2013. We describe all aspects of the task, consisting of task description, data preparation, performance metrics, and evaluation results. This bake-off contains two subtasks, i.e., error detection and error correction. We evaluate the systems that can automatically point out the spelli...

Publication date: 2014